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I.  OVERVIEW 


This  annual  report  covers  the  second  year  of  research  carried  out  under  Contract 
DAAG29-84-K-0024.  The  research  is  part  of  the  Joint  Services  Electronics  Program  and  is 
administered  by  the  U.  S.  Army  Research  Office.  Research  activities  are  concentrated  in 
two  areas:  (1)  Two-Dimensional  Signal  Processing  and  Storage,  and  (2)  Theory  and 
Application  of  Electromagnetic  Measurements. 

The  research  in  two-dimensional  signal  processing  and  storage  is  concentrated  in  five 
major  areas.  These  areas  overlap  and  the  research  activities  interact  and  reinforce  one 
another.  Research  in  Work  Unit  Number  1,  Multidimensional  Digital  Signal  Processing,  is 
concerned  with  the  theory,  design,  and  implementation  of  digital  signal  representations 
and  digital  signal  processing  algorithms  and  systems.  Work  Unit  Number  2,  Multiprocessor 
Architectures  for  Digital  Signal  Processing,  focusses  upon  hardware  and  software  problems 
in  the  use  of  multiport  memories  and  multiple  processors  for  high-speed  implementations  of 
digital  signal  processing  algorithms.  The  research  in  Work  Unit  Number  3,  Two-Dimensional 
Optical  Storage  and  Processing,  is  concerned  with  problems  of  using  holographic 
information  storage  as  the  basis  for  multiport  memories  and  for  optical  computation.  Work 
Unit  Number  4,  Two-Dimensional  Optical! Electronic  Signal  Processing,  is  concerned  with  the 
theory,  implementation,  and  application  of  combined  optical  and  electronic  digital  signal 
processing  techniques.  Work  Unit  Number  5  is  directed  toward  problems  in  the  design  of 
VLSI  implementations  of  digital  signal  processing  systems. 

The  other  two  work  units  in  the  JSEP  program  are  concerned  with  electromagnetic 
measurements.  In  Work  Unit  Number  6,  Electromagnetic  Measurements  in  the  Time  and 
Freqi  ency-Domains,  research  is  concerned  with  both  theoretical  and  experimental 
investigations  of  the  use  of  time-domain  and  frequency-domain  methods  for  measuring  the 
characteristics  of  materials  and  electromagnetic  systems.  Work  Unit  Number  7,  Automated 
Measurements  for  Near-  and  Far-Field  Transformations,  is  concerned  with  assessing  the 
accuracy  of  computed  fields  on  the  surface  of  lossy  radomes  and  with  compensating  for 
probe  effects  when  near-field  measurements  are  made  on  spherical  and  arbitrary  surfaces. 

The  report  begins  with  the  Lab  Director’s  summary  of  the  accomplishments  during  the 
period  January  1,  1986  to  December  31,  1986.  Following  this  are  brief  reports  on  the 
individual  work  units.  These  reports  list  personnel  supported  and  discuss  in  general 
terms  the  research  that  was  carried  out  during  the  reporting  period.  Also  included  in 
each  work  unit  report  is  a  complete  list  of  publications  on  the  research  during  this 
period.  Complete  copies  of  these  publications  are  available  in  the  Annual  Report 
Appendix. 

II.  SIGNIFICANT  RESEARCH  ACCOMPLISHMENTS 

The  following  accomplishments  are,  in  the  judgement  of  the  laboratory  directors,  of 
particular  significance  and  potential  and  arc  therefore  worthy  of  special  mention. 

2. 1  Experimental  Demonstration  of  Parallel  Logic  Operations 

A  major  achievement  of  the  continuing  research  in  Work  Unit  Number  3  was  the 
experimental  demonstration  of  both  parallel  EXCLUSIVE  OR  and  parallel  NAND  logic 
operations  to  achieve  content  addressability  and  thus  digital  parallel  truth-table  look-up 
processing.  This  work  was  published  in  the  invited  paper  in  Optical  Engineering. 


This  result  provides  practical  experimental  demonstrations  of  optical  digital 
parallel  processing  using  array  logic.  It  is  highly  significant  since  it  demonstrates 
that  large  scale  parallel  digital  processors  can  be  implemented  optically.  Potential 
applications  include  remote  sensing,  air  traffic  control,  synthetic  aperture  radar 
imaging,  missile  guidance,  and  adaptive  antenna  array  beamforming. 

2.2  A  Unified  Theory  of  Translation-Invariant  Image  Processing  Systems 

A  major  achievement  of  the  research  in  Work  Unit  Number  1  is  the  development  of  a  new 
theory  for  representing  images  and  image  processing  systems.  Maragos,  in  his  Ph.D. 
thesis,  has  developed  a  new  theory  of  translation-invariant  systems  in  which  both  signals 
and  systems  are  fundamentally  represented  by  sets  rather  than  by  functions.  This  leads  to 
a  theory  of  signals  and  systems  in  which  geometric  structure  is  prominent.  The  theory  has 
already  been  applied  to  gain  new  insight  into  the  properties  and  implementation  of  many 
common  image  transformations  and  it  potentially  can  serve  as  the  basis  for  the  design  and 
implementation  of  new  image  transformations  specifically  directed  toward  enhancing, 
detecting,  and  coding  of  geometric  structure  in  images. 

2.3  Monostatic  Near-Field  Radar  Cross-section  Measurement 

A  monostatic  near-field  radar  cross-section  measurement  of  a  simple  target  (a  square 
flat  plate)  was  performed  on  a  planar  surface  near-field  measurement  system.  The  far- 
field  monostatic  radar  cross-section  was  correctly  determined  from  the  near-field 
measurements.  This  demonstration  was  an  important  step  in  the  process- of  developing  the 
monostatic  near-field  radar  cross-section  technique.  Should  this  technique  become  fully 
developed,  the  monostatic  scattering  of  large  (full  scale)  targets  could  be  accurately 
measured.  The  near-field,  intermediate  field  and  far-field  scattering  properties  arc 
determined  in  one  measurement  set.  Radar  ananoloies  such  as  glint  can  be  accurately 
predicted  from  these  measurements. 


WORK  UNIT  1 


TITLE: 

Multidimensional  Digital  Signal  Processing 
SENIOR  PRINCIPAL  INVESTIGATOR: 

R.  W.  Schafer,  Regents’  Professor 
SCIENTIFIC  PERSONNEL: 

R.  M.  Mersereau,  Professor 

M.  H.  Hayes,  Associate  Professor 

C.  Au  Yeung,  Graduate  Research  Assistant  (Ph.D.  Candidate) 

J.  E.  Bevington,  Graduate  Research  Assistant  (Ph.D.  Candidate) 

D.  Y.  Suh,  Graduate  Research  Assistant  (Ph.D.  Candidate) 

L.  Hertz,  Graduate  Research  Assistant  (Ph.D.  Candidate) 

C.  H.  Richardson,  Graduate  Research  Assistant  (Ph.D.  Candidate) 

SCIENTIFIC  OBJECTIVE: 

The  long  term  scientific  objective  of  this  research  is  to  understand  the  means  by 
which  multidimensional  signals  such  as  images  should  be  modelled  and  represented  to 
facilitate  the  encoding,  enhancement  and  automatic  extraction  of  information  from  such 
signals,  and  to  develop,  analyze  and  extend  computer  algorithms  for  these  purposes. 

RESEARCH  ACCOMPLISHMENTS: 

A.  Image  Segmentation  by  Texture  ( Bevington ,  Mersereau ) 

This  work  has  evolved  from  its  initial  exploration  of  the  use  of  linear  prediction 
coefficients  as  features  for  texture  characterization  to  the  development  of  a  program  for 
general  purpose  image  segmentation.  A  split/merge  algorithm  has  been  developed  which 
makes  use  of  the  maximum  likelihood  edge  detector  developed  earlier  along  with  simpler 
boundary  detectors.  This  work  has  led  to  a  statistical  characterization  of  a  number  of 
edge  detectors  and  an  analysis  of  their  performance  on  a  variety  of  different  edge  types  - 
points,  grey  level  discontinuities,  texture  boundaries,  cracks,  ridges,  and  valleys.  This 
study  is  continuing.  The  image  segmentation  program  is  being  applied  to  the  problem  of 
low  bit-rate  image  coding. 

D.  Constrained  Signal  Estimation 

This  project  represents  a  continuation  of  a  long  standing  project  on  iterative 
procedures  for  deconvolution  of  blurred  images.  This  particular  phase  of  the  research  is 
concerned  with  the  development  of  techniques  for  iterative  deconvolution  using  a  maximum 
entropy  criterion.  Theoretical  extensions  from  the  power  spectrum  estimation  problem  have 
been  developed  which  allow  the  results  of  that  problem  to  be  applied  to  deblurring.  Four 
procedures  for  finding  the  maximum  entropy  solution  have  been  developed  and  have  been 
applied  to  the  removal  of  lowpass  blurs  and  also  to  the  problem  of  reconstructing  a 
multidimensional  signal  from  its  projections.  Conditions  on  the  existence  of  a  feasible 
solution  have  been  proven  and  the  theory  is  currently  being  advanced  to  make  the  procedure 
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more  robust  in  the  presence  of  measurement  errors.  Current  efforts  are  also  being 
directed  toward  the  issue  of  reducing  the  complexity  of  the  resulting  algorithms. 

C.  Low  Bit-Rate  Encoding  of  Binary  Video  Signals 

Research  has  begun  on  a  project  to  efficiently  encode  two-level  (binary)  image 
sequences.  Currently  this  work  has  not  gone  beyond  the  stage  of  recreating  previously 
available  results  but  it  is  expected  that  this  project  will  make  extensive  use  of  ideas 
from  mathematical  morphology  which  were  developed  earlier. 

D.  Quadratically  Convergent  Deconvolution  Algorithms 

Iterative  algorithms  based  on  the  method  of  successive  approximations  have  become 
very  popular  for  signal  deconvolution  due  to  the  flexibility  that  they  allow  for  the 
incorporation  of  signal  constraints  into  the  restoration.  One  of  the  limitations  of  these 
iterative  algorithms  is  that  they  only  achieve  a  linear  rate  of  convergence.  With  a 
slight  modification,  however,  it  was  shown  that  it  is  possible  to  achieve  a  quadratically 
convergent  deconvolution  algorithm.  In  effect,  this  modification  corresponds  to  a 
reinitialization  of  the  deconvolution  algorithm  with  a  new  observation  equation  at  each 
iteration.  It  was  shown  that  the  corresponding  distortion  operator  hk  converges 
quadratically  to  an  impulse  and,  as  a  result,  the  restoration  xk  converges  quadratically 
to  x.  Therefore,  when  the  standard  iteration  requires  k  iterations  to  achieve  a  given 
reduction  in  the  mean  square  error,  the  modified  iteration  will  require  only  log2(Jt) 
iterations. 

E.  Signal  Modeling  and  Power  Spectrum  Estimation 

Power  spectrum  estimation  is  a  special  form  of  signal  reconstruction  problem  where 
one  wants  to  determine  the  autocorrelation  function  or  its  Fourier  transform  from  a  finite 
time  observation  of  a  time  series  or  from  a  noisy  and  truncated  autocorrelation  sequence. 
We  have  considered  two  problems  related  to  signal  modeling  and  the  application  of  these 
models  to  spectrum  estimation.  The  first  problem  is  concerned  with  the  modeling  of  a 
signal  as  the  sum  of  sinusoids  in  white  noise  where  the  sinusoidal  frequencies  are  varying 
as  a  function  of  time.  Typically,  with  such  a  model  an  adaptive  version  of  the  Pisarenko 
harmonic  decomposition  would  be  applied  to  the  data  to  extract  the  model  parameters.  By 
exploiting  some  properties  of  the  Levinson/Durbin  recursion,  however,  a  new  algorithm  was 
developed  for  estimating  the  model  parameters.  In  particular,  an  iterative  procedure  for 
finding  the  white  noise  power  was  proposed  which  uses  a  bisection  search  algorithm  which 

terminates  when  the  set  of  reflection  coefficients  corresponding  a  set  of  correlation 

values  satisfy  a  specific  set  of  conditions.  Once  the  white  noise  power  has  been 
determined,  the  sinusoid  frequencies  arc  extracted  by  rooting  the  polynomial  associated 
with  the  reflection  coefficients.  For  nonstationary  data,  this  algorithm  was  made  into  an 
adaptive  algorithm  where  both  the  white  noise  power  and  the  minimum  eigenvector 
(eigenfilter)  are  recursively  updated  in  time. 

The  second  modeling  problem  considered  was  concerned  with  developing  an 

autoregressive  moving  average  lattice  filter  model  for  a  linear  time-varying  system.  As  a 
result  of  this  work  a  new  ARMA  lattice  filter  structure  was  developed  which  is  consistent 
with  the  characteristics  of  the  well-known  autoregressive  and  moving  average  lattice 

filters.  In  particular,  this  ARMA  lattice  is  realized  in  terns  of  a  fully  orthogonal 
lattice  set  of  basis  vectors  and  it  evaluates  all  optimal  lattice  ARMA  filters  of  lower 
order.  In  addition  the  ARMA  lattice  structure,  a  fast  recursive  least  squares  algorithm 
for  the  evaluation  of  the  lattice  filter  coefficients  was  developed  [18]. 


F.  Theory  and  Application  of  Mathematical  Morphology 
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A  major  accomplishment  of  our  research  during  the  present  contract  is  the  development 
of  a  general  theory  of  translation-invariant  systems.  This  general  theory  significantly 
extends  the  theory  of  mathematical  morphology  and  it  unifies  a  wide  range  of  commonly  used 
image  processing  operations  under  a  common  framework.  Specifically  we  showed  that  any 
translation- invariant,  increasing,  system  can  be  represented  as  a  set  union  of  erosions  by 
all  structuring  elements  belonging  to  a  characteristic  set  of  elements  called  the  kernel 
of  the  system.  Although  the  kernel  of  a  system  may  generally  contain  an  infinite  number 
of  elements,  Maragos  showed  that  a  basis  or  smaller  sufficient  set  of  elements  exists  for 
a  broad  and  interesting  class  of  translation-invariant,  increasing,  systems.  It  was  also 
shown  that  this  theory  of  minimal  elements  applies  to  morphological  filters,  median  fil¬ 
ters,  order-statistics  filters,  edge  detectors,  shape  recognition  transformations  and  an 
interesting  class  of  linear  shift-invariant  systems.  The  theory  has  already  lead  to  new 
insights  into  the  properties  of  such  systems  and  also  to  new  approaches  to  the  implementa¬ 
tion  of  such  systems. 

The  general  theory  has  potential  to  be  applied  in  the  design  and  synthesis  of  image 
transformations.  Since  any  translation-invariant  increasing  system  can  be  represented  as 
a  union  of  erosions  by  its  kernel  elements,  new  systems  can  be  defined  by  specifying  the 
elements  of  the  kernel  basis.  It  is  not  known  how  to  choose  the  kernel  basis  so  as  to 
create  a  system  with  prescribed  desired  properties.  In  order  to  learn  how  to  do  this  we 
have  begun  to  study  some  of  the  systems  that  we  already  know  have  a  finite  kernel  basis, 
such  as  morphological  opening  and  closing,  median  filters,  order-statistics  filters,  and 
certain  shape  recognition  transformations. 

With  this  knowledge  as  a  base,  it  may  be  possible  to  determine  techniques  for 
designing  nonlinear  image  processing  systems  to  perform  specific  processing  functions  such 
as  noise  removel,  edge  detection,  and  shape  detection. 

So  far,  our  most  impressive  results  have  been  for  two-level  (binary)  images. 
However,  most  images  are  multi-level  (greytone)  images.  Graytone  images  can  be 
represented  by  a  collection  of  sets  which  specify  image  pixels  that  exceed  different 
thresholds.  Indeed,  if  enough  threshold  sets  are  used,  the  graytone  image  can  be 
represented  with  high  accuracy.  In  many  cases,  it  is  desirable  to  obtain  thresholdcd 
versions  of  an  image  with  only  a  few  different  thresholds.  An  example  is  in  representing 
images  of  integrated  circuits  or  printed  circuit  boards  where  the  image  consists  of 
regularly  shaped  geometric  structures  of  only  a  few  different  types  of  material. 
Obtaining  thresholdcd  images  that  retain  information  on  the  basic  geometric  structures  is 
a  nontrivial  task  due  to  uneven  lighting  and  focussing  problems. 

A  new  approach  to  multi-level  thresholding  has  been  developed.  In  this  method,  the 
threshold  is  adapted  so  as  to  insure  that  edges  in  the  thresholded  image  arc  aligned  with 
the  edges  of  the  original  graytone  image.  In  addition,  thresholds  arc  adapted  spatially 
to  account  for  non-uniform  lighting,  shadows,  etc. 

Research  is  now  being  directed  at  combining  of  contextual  and  other  symbolic 
knowledge  with  the  morphological  transformations  to  obtain  automatic  algorithms  for  a 
variety  of  image  analysis  problems. 
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SCIENTIFIC  OBJECTIVES 

The  primary  objective  of  this  proposed  work  unit  is  to  develop  systematic  techniques 
for  the  automatic  generation  of  provably  optimal  multiprocessor  implementations  for  a 
broad  class  digital  signal  processing  (DSP)  algorithms  and  for  a  broad  class  of 
multiprocessors  systems.  Stated  in  another  fashion,  the  goal  of  this  research  is  to 
develop  DSP  "compilers,"  where  the  input  is  an  algorithm  specification  and  the  output  is  a 
complete,  optimal  multiprocessor  implementation. 

A  basic  philosophy  of  this  research  has  always  been  to  perform  the  theoretical 
developments  in  the  context  of  an  actual  multiprocessor  system.  The  first  system  used  for 
this  purpose,  an  eight  processor  LSI-11/2  system  [7-8j,  is  now  obsolete.  For  the  future, 
the  proposed  research  will  be  tested  using  a  personal  computer  (PC)  based  multiprocessor 
utilizing  AT&T  DSP-32  processors. 

BACKGROUND: 

Over  the  past  decade,  the  rapid  development  of  digital  integrated  circuit  techniques 
has  caused  increased  interest  in  the  area  of  parallel  architectures  for  digital  signal 
processing.  Nowhere  is  this  more  evident  than  in  multidimensional  DSP  algorithms,  where 
the  computational  dimensions  of  the  algorithms  make  serial  processing  unreasonable  for 
virtually  all  real-time  applications. 

In  any  DSP  oriented  implementation,  there  will  inevitably  be  a  mix  of  both  intrinsi¬ 
cally  serial  and  intrinsically  parallel  tasks  which  must  be  realized.  Digital  signal 
processing  algorithms,  as  a  group,  are  unique  in  that  they  arc  typically  both  highly 
computationally  intense  and  also  have  a  high  degree  of  internal  structure.  For  the  pur¬ 
poses  of  this  research,  a  DSP  implementation  is  considered  to  be  a  realization  which  is 
dominated  by  its  signal  processing  functions,  and  in  which  its  non-signal-proccssing  tasks 
(control,  user  interface,  decision  making,  etc.)  play  a  relatively  minor  role. 

A.  Algorithm  Descriptions 

For  most  of  the  past  research  from  this  work  unit,  the  DSP  algorithms  have  been 
specified  as  shift- invariant  flow  graphs  (SIFG)  [10].  A  SIFG  is  defined  as  a  directed 


graph  in  which  all  operations  are  specified  at  the  nodes,  and  in  which  the  branches  are 
directed  paths  which  specify  the  flow  of  data  between  nodes.  The  shift-invariant  con¬ 
straint  requires  that  shifting  the  (set  of)  input  sequences  results  only  in  a 
corresponding  shift  in  the  (set  of)  output  sequences. 

SIFG’s  are  capable  of  representing  a  very  large  class  of  interesting  DSP  algorithms. 
Of  course,  they  can  easily  represent  all  those  systems  which  are  representable  by  signal 
flow  graphs.  Hence,  they  can  represent  a  large  class  of  linear  and  non-linear  as  well  as 
time-varying  systems.  In  particular,  they  are  capable  of  representing  digital  filters, 
DFT  structures,  FFT  structures  (indeed  ail  fast  discrete  transform  algorithms),  correla¬ 
tion,  convolution,  homomorphic  analysis,  matched  filtering,  linear  predictive  analysis, 
LMS  adaptive  filter  structures,  direct  form  recursive  least  squares,  and  lattice  form 
least  squares,  to  name  just  a  few.  In  addition,  SIFG  can  also  represent  many  algorithms 
which  are  not  typically  DSP  algorithms,  including  a  very  large  class  of  matrix  operations 
as  well  as  algorithms  specified  in  terms  of  low  level  logic  operations  (e.g.  a  digital 
multiplier  structure).  In  brief,  SIFG’s  can  represent  all  algorithms  which  do  not  include 
any  data  dependent  branch  operations.  Qearly,  the  vast  majority  of  DSP  algorithms,  as 
well  as  other  equivalent  algorithms,  belong  to  this  class. 

A  fully  specified  flow  graph  (FSFG)  is  a  SIFG  in  which  the  nodal  operations  are 
additionally  constrained  to  be  the  atomic  (kernel)  operations  of  the  underlying  consti¬ 
tuent  processors  which  are  to  be  used  in  the  realization.  Thus  the  atomic  operations 
represent  the  smallest  granularity  at  which  possible  parallelism  may  be  exploited  (10). 
The  name  generic  flow  graph  is  used  to  distinguish  between  those  SIFG’s  that  arc  also 
FSFG’s  and  those  SIFG’s  whose  nodes  are  not  all  atomic  operations. 

B  Fully  Specified  Flow  Graph  Bounds 

The  distinguishing  characteristic  of  the  previous  research  is  the  underlying  use  of 
bounds  on  the  optimal  performance  of  an  algorithm  in  the  generation  of  the  multiprocessor 
implementations.  These  bounds  are  characteristics  of  the  graph  defining  the  algorithm, 
not  of  the  mechanism  or  type  of  realization.  In  all,  three  different  bounds  are  used:  the 
iteration  period  bound  (previously  called  the  sample  period  bound  (10-12|),  which  is  the 
minimum  possible  time  between  iterations  of  the  algorithm;  the  delay  bound,  which  is  the 
minimum  time  between  the  availability  of  an  input  (set)  and  the  availability  of  the 
corresponding  output  (set);  and  the  processor  bound,  which  is  the  minimum  number  of 
processors  required  to  achieve  the  specified  iteration  period.  Implementations  which 
achieve  the  iteration  period  bound  are  said  to  be  rate-optimal.  Implementations  which 
achieve  the  delay  bound  are  called  delay-optimal.  Implementations  which  use  the  minimum 
possible  number  of  processors  for  a  specified  iteration  period  are  said  to  be  processor 
optimal. 

The  concept  of  bounds  on  the  multiprocessor  realization  of  flow  graphs  was  first 
introduced  in  the  context  of  signal  flow  graphs  by  Fettweis  (13)  and  later  extended  by 
Rcnfors  and  Neuvo  ( 14]  and  Schwartz  and  Barnwell  1 10- 12|.  The  basic  assumptions  are  that 
the  algorithm  to  be  implemented  is  represented  as  a  FSFG,  and  that  the  algorithm  is  to  tv 
implemented  on  a  synchronous  multiprocessor.  It  is  also  assumed  that  the  computation 
times  for  all  the  nodal  operations  arc  known.  TTiis  condition  is  easily  met  in  a 
homogeneous  multiprocessor  in  which  all  the  processing  elements  are  the  same.  Nothing  is 
assumed  about  the  communications  structure,  I/O  constraints,  or  the  details  of  the  reali¬ 
zation.  Hence,  the  resulting  bounds  reflect  the  absolute  limits  on  computation  rate  and 
delay  based  on  the  atomic  structure  of  the  constituent  processors. 
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Of  course,  many  things  besides  the  structure  of  the  algorithm  and  the  fundamental 
operational  capability  of  the  processors  may  limit  DSP  implementations.  Clearly,  things 
such  as  I/O  bandwidths,  external  resource  availability,  the  number  of  available  proces¬ 
sors,  and  the  communications  architecture  may  impact  the  achievable  rate,  delay,  and 
processor  efficiency  of  an  algorithm.  But  in  their  own  way,  each  of  these  aspects  can  be 
addressed  and  corrected.  For  a  particular  multiprocessor  system  and  a  particular  FSFG, 
the  above  bounds  are  fundamental.  Hence,  if  total  implementations  can  be  developed  which 
achieve  these  bounds,  then  it  is  clear  that  no  other  implementations  exist  which  can 
operate  at  a  higher  rate,  with  less  delay,  or  with  higher  efficiency. 

C.  Systolic  Processors 

The  area  of  systolic  processors  is  currently  attracting  the  attention  of  many 
researchers  in  parallel  processing  for  scientific  and  signal  processing  tasks.  The  term 
systolic  is  assumed  to  mean  a  regular  iterated  array  of  computational  elements  with  strict 
nearest  neighbor  communications.  There  is  a  global  clock,  and  on  every  clock  cycle  each 
processor  inputs  data,  operates  on  the  data  and  outputs  data  at  the  end  of  the  cycle.  No 
information  can  flow  further  than  one  processor  in  one  system  clock  cycle.  Systolic 
processors  are  a  special  case  of  a  static  pipeline.  Until  recently  the  area  was  dominated 
by  systolic  solutions  to  specific  problems,  presented  without  verification  or  derivation. 
However,  a  number  of  systematic  and  rigorous  approaches  to  systolic  implementations  have 
now  appeared,  including  one  from  this  research  program  [10-11]. 

Systolic  processors  are  attractive  because  they  appear  to  be  well  matched  to  many  of 
the  constraints  implied  by  VLSI  implementations,  and  more  importantly  because  solutions 
are  relatively  easy  to  find.  The  problem  is  that  systolic  implementations  are  often  over- 
constrained  (global  clock  tick,  static  pipeline  configuration,  etc.),  resulting  in  solu¬ 
tions  which  have  low  processor  utilization  and  which  cannot  be  rate-optimal  or  delay- 
optimal.  Much  of  the  research  already  performed  as  part  of  this  work  unit  can  be  viewed 
as  searching  for  implementations  which  overcome  the  fundamental  limitations  of  systolic 
processors. 

D.  SSIMD  Implementations 

Historically,  Skewed  Single  Instruction  Multiple  Data  (SSIMD)  implementations  were 
the  first  class  of  solutions  which  could  overcome  the  systolic  constraints  and 
consistently  achieve  rate-optimal,  processor- optimal  implementations  with  nearest  neighbor 
communications  for  a  large  class  of  interesting  algorithms  [1,15-19].  In  SSIMD,  exactly 
the  same  program  is  executed  on  each  of  the  processors  in  the  multiprocessor,  and  that 
program  realizes  exactly  one  iteration  of  the  flow  graph.  In  an  SSIMD  program,  all  of  the 
arithmetic  operations  appear  as  explicit  instructions,  but  the  delay  nodes  are  transformed 
into  input-output  pairs.  In  this  way,  the  delay  structure  in  the  flow  graph  becomes  the 
communications  structure  in  the  SSIMD  realization  [1,15-19]. 

For  any  given  program  and  any  given  constituent  processor,  it  is  possible  to  compute 
a  sampling  period  bound  for  the  SSIMD  realization  [1].  This  SSIMD  bound  for  programs  is 
equivalent  to  the  iteration  period  bound  for  fully  specified  flow  graphs.  Hence,  if  a 
program  can  be  generated  such  that  the  SSIMD  bound  is  equal  to  the  iteration  period  bound, 
then  the  SSIMD  realization  is  rate-optimal. 

The  SSIMD  approach  to  flow  graph  realizations  is  very  attractive  for  many  reasons. 
First,  for  all  SSIMD  realizations  in  which  the  number  of  processors  is  less  than  the 
processor  bound,  the  implementations  are  perfectly  efficient  and  the  use  of  N  processors 


always  increases  the  throughput  by  a  factor  of  exactly  N.  Second,  when  the  SSIMD  itera¬ 
tion  period  bound  is  equal  to  the  iteration  period  bound,  as  is  the  case  for  many  recur¬ 
sive  digital  filter  structures,  then  there  exists  no  multiprocessor  solution  using  the 
same  constituent  processor  which  is  faster  or  more  efficient.  Third,  although  the  sample 
period  bound  concept  is  not  involved,  SSIMD  realizations  work  equally  well  for  non¬ 
recursive  structures.  A  significant  and  unique  aspect  of  SSIMD  solutions  is  that  the 
program  for  a  one  processor  solution  is  identical  to  the  program  for  a  P  processor  solu¬ 
tion.  The  only  difference  is  in  the  interprocessor  communications  connections.  Finally, 
and  most  importantly,  the  all-important  communications  architecture  for  the  final 
implementation  is  completely  specified  by  the  delay  node  structure  of  the  flow  graph.  In 
particular,  by  constraining  all  the  delay  nodes  to  be  first  order,  all  single-time-index 
(one-dimensional)  SSIMD  solutions  can  be  realized  with  a  nearest-neighbor  unidirectional 
ring.  A  similar  result  applies  to  two-dimensional  flow  graphs,  [1,15-19].  However,  if  a 
more  complex  communications  mechanism  is  available,  then  the  flow  graph  can  be  defined  to 
take  advantage  of  it  [1]. 

At  the  beginning  of  the  current  proposal  period  (1984),  an  SSIMD  compiler  had  been 
demonstrated  [5]  which  generates  full  multiprocessor  implementations  for  the  laboratory 
multiprocessor  [7-8].  This  compiler  finds  a  rate-optimal  SSIMD  implementation,  if  it 
exists,  and  the  best  SSIMD  implementation,  if  it  does  not.  SSIMD  implementations  are 
always  processor-optimal  and  communications-optimal  (if  they  contain  only  first  order 
delays).  Two  important  points  should  be  made  concerning  this  SSIMD  compiler.  First,  its 
application  is  by  no  means  limited  to  the  laboratory  multiprocessor  around  which  it  was 
developed,  and  it  can  quite  easily  be  used  in  top-down  design  systems  using  microproces¬ 
sors,  signal  processing  chips,  or  VLSI  realizations.  Second,  and  more  important,  is  the 
result  that  if  a  rate-optimal  SSIMD  solution  exists,  it  is  very  easy  to  find.  The  infor¬ 
mation  available  from  the  computation  of  the  flow  graph  bounds  defines  so  precisely  the 
character  of  a  rate-optimal  solution  that  it  is  very  simple  to  test  whether  an  optimal 
SSIMD  solution  exists  and  to  find  it  if  it  does.  In  contrast,  finding  the  best  sub-optimal 
solution  is  much  more  computationally  intense.  Hence  we  have  the  paradox  that  the  most 
desirable  optimal  solutions  are  the  easiest  to  find,  but  they  may  not  always  exist. 

E.  Hardware  Support 

From  the  very  beginning,  this  work  unit  has  maintained  a  philosophy  of  combining  the 
theoretical  developments  with  actual  multiprocessor  hardware  for  validation  and  innova¬ 
tion.  Up  until  recently,  the  hardware  system  used  was  the  laboratory  multiprocessor 
system  based  on  ten  DEC  LSI-11/2  microprocessors  [7-8].  This  system  has  proved  very 
useful  in  the  past,  but  is  now  really  obsolete. 

Based  primarily  on  research  results  from  this  work  unit,  DARPA,  in  1985,  awarded 
Georgia  Tech  a  two  year  project  to  build  a  small  scale  prototype  of  a  DSP  supercomputer. 
The  system  is  called  the  Optimal  Synchronous  Cyclo-static  Array,  or  OSCAR,  and  a  sixteen 
processor  prototype  system  was  scheduled  for  completion  in  August,  1987.  The  OSCAR  system 
was  being  built  under  the  direction  of  C.  J.  M.  Hodges,  the  same  person  who  built  the 
previous  laboratory  microprocessor  system.  Unfortunately,  due  to  internal  funding 
problems  at  DARPA,  this  project  has  been  discontinued,  and  the  OSCAR  will  not  be 
completed. 

Another  component  of  the  hardware  environment  which  impacts  this  work  unit  is  the 
development  of  DSP  microprocessors  like  the  TI  TMS320  family,  and  the  AT&T  DSP  family.  At 
the  current  time,  the  Digital  Signal  Processing  Laboratory  at  Georgia  Tech  is  making  heavy 
use  of  the  'll  TMS32010  processor  using  a  board  which  plugs  into  an  IBM  PC  [20],  Since 


multiple  boards  can  be  plugged  into  the  same  system,  fairly  powerful  multiprocessors  can 
be  configured,  and  these  have  been  used  for  some  applications  [21]  and  in  a  graduate  level 
course  in  multiprocessing.  Although  the  existing  boards  do  not  have  the  synchronous 
features  necessary  to  support  the  output  of  the  multiprocessor  compilers,  they  can  be 
easily  modified  to  include  these  features.  In  addition,  the  design  and  construction  of 

new  boards  in  this  class  is  a  relatively  simple  task.  Because  of  the  simplicity  of 

constructing  multiprocessors  in  this  way,  and  because  of  the  extremely  good 
cost/performance  of  these  machines,  such  small  scale  multiprocessors  must  become 
increasingly  important  in  the  future. 

RESEARCH  ACCOMPLISHMENTS: 

Despite  the  catastrophic  impact  of  the  cancellation  of  the  OSCAR  project  by  DARPA, 
the  research  results  from  this  work  unit  during  the  year  have  been  quite  good. 

Work  on  the  area  of  multiprocessor  architectures  for  DSP  since  the  last  major  propo¬ 
sal  has  centered  in  six  areas:  the  generation  of  cyclo-static  implementations  for  fully 
specified  flow  graphs;  the  generation  of  systolic  implementations  for  fully  specified 
flow  graphs;  efficient  SSIMD  compilation  procedures;  the  generation  of  fully  specified 

flow  graphs  from  serial  specifications;  the  use  of  blocked  filters  for  increased 

multiprocessor  throughput;  and  analysis/reconstruction  filter  bank  theory.  These  topics 
are  largely  included  in  three  theses  [4,22,23],  two  of  which  are  complete  [4,22]  and  one 
of  which  is  near  completion  [23]. 

A. I  Cyclo- static  Implementations 

By  far  the  most  important  research  result  of  this  period  was  the  creation  of  a  cyclo¬ 
static  compiler  for  fully  specified  flow  graphs  [3-4],  This  work  is  part  of  the  thesis 
research  of  David  Schwartz  [4], 

The  development  of  the  cyclo-static  compiler  really  grew  out  of  an  attempt  to  under¬ 
stand  why  SSIMD  and  PSSIMD  implementations  could  be  rate-optimal,  processor-optimal,  and 
communications-optimal  while  the  systolic  implementations  for  the  same  algorithms  could 
not.  There  are  really  two  separate  reasons  for  the  shortcomings  of  systolic  arrays.  The 
first  is  the  fact  that  systolic  processors  are  static  pipelines.  This  means  that  any 
particular  operation  in  an  algorithm  is  assigned  to  a  particular  processor  in  the  systolic 
array,  and  that  operation  is  performed  by  that  processor  on  every  iteration.  Hence,  the 
operations  are  static  and  only  the  data  moves  through  the  multiprocessor.  In  contrast, 
SSIMD,  PSSIMD,  and  cyclo-static  processors  are  dynamic  pipelines  in  which  both  the  opera¬ 
tions  and  the  data  move  through  the  multiprocessor.  The  second  reason  is  the  effect  of 
the  global  transfer  clock.  Indeed,  this  global  transfer  clock  was  the  basic 

characteristic  for  which  systolic  arrays  were  named,  giving  the  whole  system  it’s 
"pumping"  action.  There  is  no  fundamental  requirement  that  all  the  pipeline  registers  in 
the  system  be  clocked  simultaneously.  There  is  also  no  reason  that  each  processor  must 
perform  I/O  on  every  clock  cycle.  In  contrast,  the  input-output  operations  in  SSIMD, 
PSSIMD,  and  cyclo-static  implementations  move  in  parallel,  non-overlapping  wavefronts 
(with  a  periodic  pattern  to  the  spacing  of  successive  wavefronts). 


Cyclo-static  implementations  are  a  new  class  of  multiprocessor  solutions  which  can  be 
used  for  the  optimal  realization  of  iterative  or  recursive  algorithms  on  synchronous 
multiprocessors.  A  processor  which  realizes  cyclo-static  implementations  can  be  considered 
to  be  a  generalization  of  systolic  processors,  wavefront  array  processors  and  SSIMD 
processors.  Cyclo-static  realizations  differ  in  that  they  are  provably  optimal  with 
respect  to  multiple  criteria. 

Cyclo-static  implementations  are  a  broad  class  of  deterministically  scheduled  syn¬ 
chronous  MIMD  schedules  which  include  systolic  and  SSIMD  implementations  as  special  cases. 
As  previously  noted,  the  primary  feature  of  a  cyclo-static  implementation  which 
distinguishes  it  from  a  systolic  implementation  is  that,  viewed  from  the  reference  point 
of  a  single  iteration  of  the  algorithm,  a  systolic  implementation  is  static  where  as  a 
cyclo-static  implementation  is  dynamic.  In  overly  simplistic  terms,  if  a  systolic 

implementation  can  be  considered  to  be  an  array  of  processors  in  which  the  instructions 
are  fixed  in  space  and  the  data  travels  through  space,  then  a  cyclo-static  implementation 
is  one  in  which  both  the  instructions  and  data  travel  through  space.  This  extra  degree  of 
freedom  in  a  cyclo-static  implementation  is  very  important  in  generating  optimal  realiza¬ 
tions.  As  its  name  implies,  if  viewed  from  points  in  space-time  separated  by  an 

appropriate  period,  cyclo-static  implementations  can  be  considered  static. 

Cyclo-static  solutions  can  be  effectively  found  that  achieve  a  subset  of  the 

following  optimallity  criteria:  rate  optimal  (maximally  parallel,  minimum  iteration 
period),  processor  optimal  (maximum  processor  efficiency),  delay  optimal  (minimum 

throughput  delay)  and  adjacent  communications  optimal  (adjacent  processor  communications 
only).  The  procedure  for  finding  solutions  is  a  combinatorial  optimization  method  which 
is  efficient  for  typical  realizations  that  are  rate  optimal.  This  problem  was  previously 
considered  computationally  intractable.  In  particular,  previous  researchers  who 
considered  optimal  deterministic  scheduling  of  flow  graphs  took  the  approach  of 

transforming  the  original  directed  cyclic  (containing  loops)  flow  graph  to  a  directed 
acyclic  graph  (loop  free).  Unfortunately  the  optimal  solution  to  the  acyclic  graph  can 

only  fully  exploit  the  parallelism  of  the  original  graph  for  a  few  special  cases.  The 

original  aspects  of  this  work  are  the  direct  utilization  of  the  original  cyclical  graph, 
the  concept  of  applying  graph  bounds  as  part  of  the  multiprocessor  compilations  procedures 
and  the  introduction  of  a  new  approach  to  periodic  scheduling. 

A. 2.  The  Generation  of  Systolic  Implementations  from  FSFGs 

Although  systolic  algorithms  are  not  the  primary  topic  of  this  research,  many  of  the 
same  formalisms  and  techniques  which  can  be  used  to  automatically  generate  cyclo-static 
implementations  may  also  be  used  to  generate  systolic  implementations.  One  of  the  impor¬ 
tant  outputs  of  this  research  was  a  general  method  for  the  transformation  of  algorithms 
describable  by  SIFGs  into  equivalent  systolic  realizations  [10-11].  Over  the  past  several 
years,  there  has  been  considerable  interest  in  systolic  implementations,  and  a  fairly 
large  number  of  systolic  algorithms  have  been  published.  For  the  most  part,  these  algo¬ 
rithms  have  not  been  derived  using  any  formal  method,  but  rather  each  has  been  developed 
and  presented  separately,  sometimes  without  extensive  verification.  The  primary  virtue  of 
the  method  developed  in  this  research  is  that  it  represents  a  rigorous  and  systematic 
procedure  which,  if  correctly  applied,  always  results  in  an  error-free  solution.  The 
research  showed  that  many  of  the  previously  published  algorithms  and  many  new  algorithms 
can  be  generated  by  this  single  set  of  procedures. 

The  class  of  algorithms  addressed  by  the  systolic  transformation  procedures  are  all 
those  procedures  which  can  be  described  by  shift-invariant  flow  graphs.  As  noted 


previously,  this  is  a  very  broad  class  of  algorithms  which  includes  not  only  all  recursive 
and  non-recursive  linear  shift  invariant  algorithms  describabie  by  signal  flow  graphs,  but 
also  matrix  algorithms,  systems  involving  decimation  and  interpolation,  bi-linear  systems, 
and  many  more. 

The  basis  for  the  systolic  transformation  method  are  all  graph  theoretic  techniques. 
The  method  is  generally  applied  in  three  steps.  First,  an  interleaving  (up-sampling) 
transformation  is  applied  if  necessary.  The  transformation  is  necessary  in  most  recursive 
systems  to  supply  the  delay  elements  which  constrain  data/information  flow  to  a  maximum 
distance  of  one  adjacent  processor  per  global  clock  tick  (non-broadcasting).  Second,  a 
"static-pipeline  iteration  period  bound"  is  computed  for  the  transformed  graph.  This  is 
equivalent  to  the  iteration  period  bound  with  the  further  constraint  that  the  implementa¬ 
tion  must  be  static.  Finally,  using  a  knowledge  of  the  static-pipeline  iteration  period 
bound,  a  set  of  delay  transformations  are  performed  using  either  a  cut-set  or  a  single 
source  maximum  (delay)  path  spanning  tree  approach.  The  final  resulting  graph  is  directly 
realizable  as  a  pipclined/systolic  realization.  Very  similar  results  have  just  been 
reported  by  Jover  [36J. 

Many  well  known  systolic  algorithms  have  odd  numbered  processors  operating  on  odd 
clock  cycles  and  even  numbered  processors  operating  on  even  clock  cycles.  For  these 
algorithms  it  is  possible  to  interleave  two  independent  data  streams  and  process  both 
simultaneously.  Using  these  techniques,  it  is  easy  to  show  that  these  properties  are  a 
consequence  of  the  two-way  interleave  transformation  required  by  step  one  for  most  recur¬ 
sive  and  minimum  delay  solutions. 

The  data  interleaving  is  a  consequence  of  the  communications  constraints.  For  many 
systems  there  is  only  one  data  stream  to  be  processed.  For  these  systems,  the  efficiency 
of  systolic  solutions  is  bounded  above  by  the  reciprocal  of  the  interleaving  factor.  This 
inefficiency  can  be  overcome  by  the  new  formal  technique  for  merging  operations  resulting 
in  a  near  systolic  realization  that  is  referred  to  as  merged  systolic  H. 

Using  these  graph  theoretic  techniques,  it  is  very  simple  to  derive  systolic  solu¬ 
tions.  Examples  of  FIR  filters,  HR  filters  of  direct  form  I  and  II,  lattice  filters, 
matrix-vector  multipliers,  and  lower  triangular  systems  solver  have  been  presented  (11). 
The  matrix-vector  multiply  is  an  extension  of  an  FIR  filter  and  the  triangular  system 
solver  is  an  extension  of  an  HR  filter. 

A. 3.  Efficient  SSIMD  Compilation  Procedures 

As  previously  noted  in  the  background  section,  one  of  the  primary  concerns  at  the 
beginning  of  this  research  period  was  the  relatively  slow  computation  times  when  the  SSIMD 
compiler  was  unable  to  find  rate  optimal  implementations.  The  basic  reason  for  this 
effect  was  that  when  optimal  implementations  existed,  the  optimallity  constraints  which 
could  be  applied  generally  resulted  in  an  efficient  compilation.  The  problem  was  that 
after  the  compiler  proved  that  no  optimal  solution  existed,  there  were  no  new  constraints 
which  could  be  applied  while  searching  for  a  sub-optimal  solution. 

As  part  of  the  Ph.D.  research  of  Sae  Hun  Lee,  a  new  SSIMD  compiler  has  been  implemen¬ 
ted  which  can  efficiently  compile  both  optimal  and  sub-optimal  SSIMD  implementations  (6). 
Like  the  previous  compiler,  the  new  SSIMD  compiler  takes  FSFGs  as  input,  and  gives  the 
best  possible  (fastest)  SSIMD  implementations  as  output.  However  unlike  the  previous 
compiler,  the  new  compiler  uses  linkcd-list  structures  to  represent  and  manipulate  the 
structure  of  the  FSFGs,  rather  than  matrix  structures. 


The  new  compiler  operates  by  ranking  the  loops  of  a  recursive  graph  in  order  of  their 
criticallity,  with  the  most  critical  loops  first.  The  loops  are  then  sequentially 

included  in  the  partial  solution  using  a  restartable  topological  sorting  technique.  At 

each  stage,  the  partial  solution  is  tested  for  rate-optimallity.  If  at  any  stage  the 

procedure  leads  to  a  partial  solution  which  is  sub-optimal,  then  this  constitutes  a  proof 
that  no  optimal  solution  exists.  At  this  point,  the  compiler  backs  up  and  finds  the 
lowest  iteration  period  which  is  consistent  with  the  existing  partial  solution.  This  new 
iteration  period  is  used  as  the  basis  for  a  new  optimallity  definition,  with  a 

corresponding  set  of  new  ‘pseudo’  critical  loops,  and  the  search  continues.  The 

dynamically  defined  pseudo  critical  loops  and  SSIMD  iteration  period  bounds  provide  the 
constraints  necessary  to  limit  the  search  space,  and  reduce  the  complexity  of  the 

compiler. 

In  comparison  tests  between  the  new  and  the  old  compiler,  the  new  compiler  was  found 
to  operate  at  about  twice  the  rate  of  the  old  compiler  on  problems  which  have  a  rate- 

optimal  SSIMD  solution,  and  about  twenty  times  as  fast  on  problems  which  do  not.  Of 

course,  these  numbers  are  only  examples  because  the  running  times  are  highly  problem 
dependent. 

A. 4.  Multiprocessor  Compilation  from  Serial  Specifications 

Another  set  of  research  results,  largely  included  in  the  Ph.D.  thesis  of  work  Sae  Hun 
Lee,  is  the  presentation  of  a  set  of  techniques  for  the  generation  of  optimal  multiproces¬ 
sor  realizations  from  non-parallel  (serial)  algorithm  representations.  These  techniques, 
when  they  are  fully  implemented,  will  form  the  basis  of  a  system  which  takes  as  input  a 
serial  presentation  of  the  algorithm  to  be  implemented,  and  gives  as  output  an  optimal 
SSIMD,  static,  or  balanced  PSSIMD  solution  [6,23).  If  no  optimal  implementation  of  any  of 
these  classes  exists,  then  the  full  cyclo-static  compiler  described  above  can  be  used  to 
find  an  optimal  solution. 

The  entire  compilation  procedure  can  be  divided  into  four  separate  parts:  the  gene¬ 
ration  of  a  generic  flow  graph  (GFG)  from  a  serial  specification;  the  generation  of  a 
fully  specified  flow  graph  from  a  GFG;  the  generation  of  the  simplest  multiprocessor 
schedule  which  attains  rate-optimallity;  and  the  generation/specification  of  the  communi¬ 
cations  solution.  Although  these  four  separate  parts  were  developed  to  be  combined  into  a 
single  compiler,  each  is  of  interest  in  its  own  right. 

The  first  part  of  this  work  deals  with  the  problem  of  finding  the  best  GFG  which 
corresponds  to  a  particular  serially  specified  algorithm.  The  serial  representation  is  in 
the  form  of  a  set  of  FORTRAN  arithmetic  statements  and  DO  statements.  The  macro  nodes  in 
the  GFG  are  in  the  form  of  the  combination  of  any  number  of  the  same  operation.  The  best 
GFG  is  defined  as  the  one  with  the  smallest  number  of  nodes.  The  basic  procedure  is  to 
identify  the  infinite  loop  which  defines  the  algorithm,  and  to  "unwind”  all  of  the  inside 
DO  loops  to  form  the  graph. 

Once  a  GFG  is  generated,  the  second  part  of  the  compiler  converts  it  to  a  FSFG  using 
a  tree  balancing  technique  (6,23).  The  basic  procedure  here  is  to  define  a  generic  flow 
graph  iteration  period  hound,  which  is  a  lower  bound  on  the  iteration  period  bounds  of  all 
FSFGs  which  can  be  generated  from  the  GFG.  Using  the  bound  as  a  guide,  a  FSFG  with  the 
minimum  possible  iteration  period  bound  is  constructed. 
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The  research  has  taken  two  separate,  but  related,  approaches  to  the  last  two  parts 
of  the  compiler.  In  the  first  approach,  the  compiler  uses  an  extended  version  of  the 
SSIMD  compiler  discussed  in  section  A. 3  to  locate  the  simplest  optimal  schedule.  The 
compiler  first  tries  to  find  a  rate-optimal  SSIMD  implementation.  If  the  compiler  proves 
that  no  such  solution  exists,  instead  of  trying  to  find  the  best  sub-optimal  SSIMD  solu¬ 
tion,  as  before,  it  tries  to  find  a  rate-optimal  static  solution.  If  no  such  solution 
exists,  it  tries  to  find  balanced  PSSIMD  solutions  of  increasing  order  until  a  rate- 
optimal  solution  is  found.  If  no  such  solution  exists,  a  full  cyclo-static  solution  must 
be  used  [3-4).  Then,  after  the  rate-optimal  multiprocessor  schedule  is  identified,  the 
compiler  finds  the  minimum  pipelined  and  non-pipelined  communications  architecture  which 
can  support  the  solution. 

In  the  second  approach,  which  is  still  being  investigated,  the  communications  con¬ 
straints  are  combined  with  the  optimallity  constraints,  and  the  last  two  parts  of  the 
compiler  are  combined. 

A. 5.  Increasing  ilie  Parallelism  of  Filters 

Recurrence  relations,  such  as  recursive  filters,  specified  by  fully  specified  flow 
graphs  (FSFG),  have  a  maximum  parallelism  that  is  constrained  by  one  or  more  critical 
loops.  In  a  processor  optimal  implementation,  adding  additional  processors  beyond  the 
processor  bound  can  not  improve  performance.  However  it  is  possible  to  increase  the 
parallelism  of  the  problem  by  modifying  the  FSFG.  Two  methods  to  increase  the  parallelism 
of  recursive  filters  have  been  developed  [4]  [12].  The  first  approach  is  a  blocking 
method  which  is  based  on  the  block  state  variable  form  for  digital  filters.  For  the  block 
state  variable  form,  an  upper  and  lower  bound  on  the  minimum  possible  iteration  period 
{iteration  period  bound)  of  a  realization  was  developed.  In  addition,  the  associated 
number  of  processors  required  to  achieve  a  given  iteration  period  was  determined.  It  was 
also  shown  that  for  many  problems  the  blocked  form  has  lower  computational  requirements 
and  decreased  finite  word  effects,  even  when  implemented  on  a  typical  sequential 
uniprocessor.  Similar,  and  related,  results  have  been  reported  by  Hui-Hung  and 
Mcsserschmitt  [24-25]. 

The  second  method,  the  "extended  Moyer’s  method,”  [26]  [4]  is  based  on  transforming 
the  transfer  function  of  a  filter  to  a  non-minimal  form.  Since  the  method  is  applied  to 
the  transfer  function,  the  method  is  independent  of  the  form  of  the  filter.  An  equivalent, 
nor^-minimal,  transfer  function  can  be  constructed  that  contains  only  delays  of  order  L, 
(z  ),  in  the  loops.  If  the  minimal  form  realization  had  a  critical  loop  that  contained  k 
delays,  the  non-minimal  form  contains  kL  delays,  and  therefore  the  iteration  period  bound 
i^»  factor  of  L  smaller.  This  method  was  suggested  by  Betlanger  [26]  for  cases  where  L  = 
2  ,  and  extended  to  the  general  integer  case  by  Moyer,  [26].  The  results  were  further 
extended  by  Schwartz  [4]  to  reformulate  the  solution  in  terms  of  real  coefficients  and 
analyzed  in  the  rate  bound  framework. 

2 

Both  methods  exhibit  a  cost  of  0(L  )  processors  to  achieve  the  maximum  processing 
rate  resulting  from  transforming  by  order  L.  The  block  state  variable  approach  asympto¬ 
tically  approaches  ideal  speedup,  while  the  extended  Moyer’s  method  always  achieves  ideal 
speedup.  However  the  block  state  variable  form  has  better  numerical  properties  than  the 
untransformed  system,  while  the  extended  Moyer’s  method  has  poor  numerical  properties  wiiti 
increasing  block  size.  While  neither  method  is  ideal,  depending  on  the  problem,  both  arc 
useful  for  small  transformation  orders. 


A. 6.  Time-Frequency  Representations 


The  final  area  of  research  was  analysis/reconstruction  systems  based  on  time- 
frequency  representations.  Such  systems  are  fundamental  components  of  many  signal 
analysis  and  signal  processing  systems.  This  work  is  largely  from  the  Ph.D.  thesis  of 
Mark  Smith  [22,27-30].  Mark  Smith  himself  was  supported  on  State  research  funds,  and  not 
directly  on  this  contract.  However,  his  thesis  advisor,  T.  P.  Barnwell  ID,  was  supported 
by  JSEP  for  his  part  in  this  work. 


There  are  two  main  contributions  which  came  out  of  this  portion  of  the  research.  The 
first  was  the  invention  of  the  conjugate  quadrature  filter  (often  referred  to  as  Smith- 
Bamwell  filters  by  other  authors).  These  filters  form  the  basis  of  the  first  ever 
maximally  decimated  two-band  analysis/reconstruction  systems  [28,30]  in  which  filters  of 
arbitrarily  good  quality  could  be  easily  designed.  These  filters  have  been  widely 
recognized  as  the  solution  to  a  long  standing  fundamental  problem  in  digital  signal 
processing.  Multiband  exactly  reconstructing  analysis/synthesis  systems  can  be  easily 
constructed  using  tree  structures  of  conjugate  quadrature  filters. 

The  second  major  contribution  in  this  area  is  the  introduction  of  the  alias-component 
matrix  (AC-matrix)  [28,31]  formulation  for  the  description  of  maximally  decimated 
analysis/reconstruction  systems.  This  formulation  is  based  on  the  general  filter  bank 
transform  (GFBT),  which  is  a  time-frequency  representation  which  is  similar  to,  but  more 
general  than,  the  short-time  Fourier  transform  (STFT).  The  primary  virtue  of  the  AC- 
matrix  formulation  is  that  it  allows  most  of  the  fundamental  issues  in  maximally  decimated 
time-frequency  systems  (filter  quality,  system  complexity,  frequency  distortion,  phase 
distortion,  and  aliasing)  to  be  addressed  separately  in  the  design  process.  The  effect  of 
the  AC-matrix  formulation  is  to  bring  all  previous  maximally  decimated 
analysis/reconstruction  systems  and  many  new  systems  under  the  umbrella  of  a  single 
formalism.  As  such,  all  these  related  systems  can  be  easily  compared,  and  new  systems  can 
be  easily  designed. 


During  the  last  half  of  1986,  this  work  has  been  expanded  by  a  new  Ph.D.  student 
named  Kambis  Nayebi  [37],  One  of  the  central  unsolved  problems  in  this  area  is  to  find 
techniques  for  design  of  high  quality  filter  banks  for  analysis/reconstruction  systems 
based  on  large  numbers  of  channels.  Nayebi  [37]  has  been  able  to  develop  a  set  of 
orthogonallity  constraints  between  the  coefficients  of  the  filters  in  the  filter  banks. 
This  represents  a  powerful  new  tool  for  the  realization  of  such  systems. 
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SCIENTIFIC  OBJECTIVE: 

The  long-term  scientific  objective  of  this  research  is  to  develop  broadly-based, 
theoretical  and  experimental  knowledge  of  two-dimensional  optical  information  processing 
including  algorithms,  architectures,  systems,  and  devices.  This  would  bring  together  a 
range  of  concepts  from  basic  physics  to  information  processing  in  its  most  generalized 
form.  Optical  systems  based  on  content-addressable  memory  processing,  associative 
processing,  Givens  rotations,  and  hyperbolic  rotations  are  being  analyzed  starting  from 
basic  physical  principles  and  extending  through  experimental  systems  performance. 


RESEARCH  ACCOMPLISHMENTS: 

A.  Complexity  of  Residue  Number  System  Processing 

An  analytic  expression  for  the  lower  bound  on  the  complexity  of  residue  addition  and 
multiplication  was  developed  Significant  reduction  of  the  the  required  stored  logic  in  a 
content-addressable  memory  is  noted.  Errors  in  these  results  as  published  by  C.  A. 
Papacliristou  were  corrected. 

This  research  was  published  in  IEEE  Transactions  on  Computers. 


D.  Multi-Level  Coded  Residue-Based  Content-Addressable-Memory  Optical  Computing 

The  extension  of  truth-table  look-up  processing  beyond  primitive  operations  (such  as 
addition)  to  higher-level  operations  (such  as  discrete  matched  filtering)  was  presented. 
Use  of  the  residue  system  and  logical  minimization  techniques  to  reduce  the  required 
number  of  reference  patterns  stored  in  a  content-addressable  memory  was  illustrated  for 


16-bit  full-precision  addition.  Multilevel  coding  of  the  numbers  was  introduced  as  a 
method  to  achieve  further  truth-table  reduction.  The  required  number  of  reference 
patterns  for  implementing  the  residue  addition  and  multiplication  operations  were  provided 
for  all  moduli  from  2  through  32  with  2-,  3-,  and  5-level  coding.  An  optical  holographic- 
implementation  of  a  system  that  processes  multilevel  coded  numbers  was  presented. 

This  research  was  published  in  Applied  Optics. 


C.  Truth-Table  Look-Up  Processing 

The  need  for  ultra-high-speed  computing  for  a  variety  of  modem  processing  problems 
has  generated  new  interest  in  using  truth-table  look-up  techniques.  Further,  due  to  the 
frequently  parallel  nature  of  these  processing  problems,  optical  systems  appear  to  be 
promising  for  these  applications.  The  basic  principles  of  truth-table  look-up  processing 
were  reviewed.  The  issues  of  number  representation,  multilevel  coding,  and  logical 
minimization  were  developed.  Example  fixed-radix  and  residue  number  representations  were 
given  with  and  without  multilevel  coding.  Logical  reduction  techniques  were  discussed 
with  examples.  A  comparison  of  the  number  of  truth-table  entries  needed  for  16-bit  full 
precision  addition  and  multiplication  were  given,  illustrating  the  advantage  of  the  multi¬ 
level  coded  residue  number  representation. 

These  results  were  published  in  an  invited  paper  in  Optical  Engineering. 


D.  Integrated-Optical  Givens  Rotation  Device 

The  Givens  rotation  operation  occupies  a  central  role  in  linear  algebraic  signal 
processing.  An  integrated-optical  coherent  implementation  of  an  elementary  rotation 
matrix  device,  based  on  thick  grating  diffraction,  to  perform  this  operation  was 
suggested.  It  was  shown  that  existing  electrooptic  phase  shifting  and  grating  diffraction 
devices  can  be  combined  to  produce  a  very  fast  Givens  rotation  device. 

This  research  was  published  in  Applied  Optics. 


E.  Logical  Minimization  of  Multilevel-Coded  Functions 

Discrete  numerical  values  in  digital  processing  systems  may  be  encoded  in  two-level 
(binary)  or  higher-level  (multilevel)  representations.  Multilevel  coding  can  produce 
smaller  and  more  efficient  processors.  In  truth-table  look-up  processing,  the  number  of 
entries  (reference  patterns)  can  be  reduced  using  multilevel  coding.  Since  parallel- 
input/parallel  output  optical  truth-table  look-up  processors  can  be  constructed  based  on 
holographic  content-addressable  memories,  it  is  essential  to  know  the  minimum  storage 
required  to  implement  various  functions.  A  new  simple  method  for  reducing  multivalued 
functions  was  developed.  This  method  is  based  on  an  extension  of  the  Quine-McCluskey 
minimization  method  used  for  binary  logic  functions.  This  minimization  method  was  then 
applied  to  the  truth  tables  representing  (1)  modified  signed-digit  addition,  (2)  residue 
addition,  and  (3)  residue  multiplication.  A  programmable  logic  array  gate  configuration 
for  the  modified  signed-digit  adder  was  presented. 

This  research  was  published  in  Applied  Optics. 


F.  Grating  Diffraction 
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A  rigorous  coupled-wave  analysis  for  metallic  surface-relief  gratings  was  presented. 
This  approach  allows  an  arbitrary  complex  permittivity  to  be  used  for  the  material  and 
thus  avoids  the  infinite  conductivity  (perfect  conductor)  approximation.  Both  TE  and  TM 
polarizations  and  arbitrary  angles  of  incidence  are  treated.  Diffraction  characteristics 
for  rectangular-groove  gold  gratings  with  equal  groove  and  ridge  widths  were  presented  for 
freespace  wavelengths  of  0.5,  1.0,  and  10.0  microns  for  all  diffracted  orders  as  a 
function  of  period,  groove  depth,  polarization,  and  angle  of  incidence.  Results  included 
the  following:  (1)  TM  polarization  diffraction  characteristics  vary  more  rapidly  than  do 
those  for  TE  polarization,  (2)  95%  first-order  diffraction  efficiency  occurs  for  TM 
polarization  at  10.0  microns,  (3)  <0.1%  zero-order  specular  reflectivity  occurs  for  both 
TE  and  TM  polarizations,  (4)  >50%  absorption  of  incident  power  occurs  at  0.5  micron,  and 
(5)  the  perfect-conductor  approximation  is  not  valid  for  TM  polarization  at  any  of  the 
wavelengths  and  is  not  valid  for  TE  polarization  at  0.5  micron. 

This  research  was  published  in  Journal  of  the  Optical  Society  of  America. 


G.  Analysis  Interdigitated  Electrodes  on  Electro-Optic  Waveguides 

Integrated  optical  interdigitated-electrode  devices  are  used  to  diffract  light  and  to 
launch  acoustic  waves.  The  electric  field  and  the  permittivity  and  strain  tensors  induced 

by  a  voltage  applied  to  periodic  interdigitated  electrodes  of  finite  thickness  on  the 

surface  of  an  anisotropic  electro-optic  crystalline  waveguide  were  calculated  rigorously. 

The  extremely  important  existence  of  a  buffer  layer  between  the  electrodes  and  the 

waveguide  occurring  in  practical  devices  was  included  in  the  analysis. 

This  work  has  been  accepted  for  publication  in  Journal  of  Lightwave  Technology. 
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//.  Zero-Reflectivity  Dielectric  Surface  Relief  Gratings 

Using  rigorous  coupled-wave  analysis,  high  spatial-frequency  rectangular-groove 
surface-relief  phase  gratings  were  shown  to  be  capable  of  exhibiting  zero  reflectivity. 
Thus  these  corrugated  surfaces  may  act  as  antirefiection  coatings  in  a  variety  of 
applications.  The  diffraction  characteristics  of  rectangular-groove  surface-relief 
gratings  were  presented  for  several  ratios  of  incident  wavelength  to  grating  period  as  a 
function  of  filling  factor,  groove  depth,  angle  of  incidence,  and  polarization.  The 
conditions  for  zero  reflectivity  were  identified.  Results  are  compared  to  single- 
homogeneous-layer  approximate  theory  results.  In  the  limit  of  long  wavelengths  for  an 
electromagnetic  wave  in  a  dielectric  of  refractive  index  n  normally  incident  upon  a 
dielectric  of  index  n  ,  it  was  determined  that  for  antireflection  behavior,  the  grating 
groove  depth  should  be  X/4(n  n  )l/2  and  the  filling  factor  should  be  n/(n  +  n^}  or 

«2/(«t  +  n2)  for  the  electric  field  perpendicular  or  parallel  to  the  grating  vector, 

respectively.  The  spectral  and  angular  responses  of  these  gratings  are  like  those  of 
single-homogencous-laycr  antireflection  coatings.  These  gratings  also  exhibit 
birefnngent  retardation. 

This  research  has  been  accepted  for  publication  in  Applied  Optics. 
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Section  of  the  ASEE,  1986. 

Thomas  K.  Gaylord  received  "Best  Research  Paper  in  Engineering"  award  for  paper  "Analysis 
and  applications  of  optical  diffraction  by  gratings"  from  Sigma  Xi,  1986. 


WORK  UNIT  NUMBER  4 


TITLE: 


Two-Dimensional  Optical/Electronic  Signal  Processing 
SENIOR  PRINCIPAL  INVESTIGATOR: 

W.T.  Rhodes,  Professor 
SCIENTIFIC  PERSONNEL: 

J.N.  Hereford,  Graduate  Research  Assistant  (Ph.D.  candidate) 
R.W.  Stroud,  Graduate  Research  Assistant  (Ph.D.  candidate) 


SCIENTIFIC  OBJECTIVE: 

The  long  term  scientific  objective  of  this  research  is  to  gain  a  good  understanding 
of  the  capabilities  and  limitations  of  hybrid  optical/electronic  methods  for  high 
throughput  processing  of  2-D  signal  information  and  to  develop  new  and  widely  applicable 
techniques  based  on  such  methods.  Emphasis  is  placed  on  establishing  the  capabilities  of 
systems  that  mate  well  with  digital  signal  processing  systems. 

RESEARCH  ACCOMPLISHMENTS: 

A.  Bipolar  Incoherent  Spatial  Filtering 

Our  original  objective  in  this  area  was  to  develop  effective  methods  for  bipolar 
spatial  filtering  using  incoherent  optical  systems  that  are  simple  to  implement  and 
efficient  with  respect  to  light  utilization.  That  objective  was  augmented  with  the 
additional  goal  of  maximizing  overall  system  dynamic  range  in  the  case  where  optical  and 
digital  subsystems  are  combined  with  a  scanning  operation  in  between. 

Work  in  1985  saw  completion  of  an  elegant  unifying  theory  of  pupil  function 
specification  for  the  two  kinds  of  two-pupil  hybrid  optical/electronic  spatial  filtering 
methods  (one  method  involving  interference  of  light  from  the  two  pupils,  the  other  method 
not).  Further,  a  two-step  algorithm  was  developed  for  designing  pupil  functions  that  arc 
optimal  or  nearly  so  in  the  sense  of  reducing  noise  and  enhancing  contrast.  During  the 
current  year,  three  journal  articles  on  the  research  have  been  published,  a  manuscript  for 
another  paper  has  been  submitted  for  publication,  and  a  paper  on  the  subject  was  also 
presented  at  a  conference. 

Currently  a  new  doctoral  student  is  looking  at  a  scheme  for  realizing  pupil  function 
syntheses  of  this  kind  using  birefringent  materials  in  common-path-interfcrometer-based 
imaging  systems.  This  approach  promises  to  eliminate  many  of  the  practical  difficulties 
characteristic  of  other  interferometer- based  systems. 


B.  Optical  Implementation  of  Morphological  Transformations 

In  late  1985  we  began  a  new  study  of  highspeed  opto-electronic  methods  for 
implementing  morphological  transformations  (e.g..  erosions,  dilations,  openings,  closings) 
on  binary  images  and  nonlinear  filtering  operations  (e.g.,  median  filtering)  on  gray-scalc 
images.  The  morphological  transformations,  which  involve  the  interaction  of  the  shape 
under  study  with  a  structuring  element,  are  being  used  more  and  more  in  pattern 
recognition  and  artificial  intelligence  for  robotic  vision.  Some  of  these  operations  are 
quickly  performed  using  binary  digital  logic  circuitry.  Others,  however  (e.g.,  skeleton 
decomposition,  where  the  pattern  under  study  is  reduced  to  primitive  structural 
components),  arc  sufficiently  complex  that  TV-frame-rate  processing  is  difficult  to 
achieve.  The  schemes  investigated  perform  the  necessary  operations  in  parallel.  Real¬ 
time  operation  is  contingent  on  finding  a  device  that,  much  like  high  contrast  film,  will 
perform  a  hard  limiting  operation  on  an  image,  but  essentially  instantaneously.  Such 
devices  are  currently  under  development  (e.g.,  at  GTE  laboratories),  and  the  methods  being 
investigated  appear  to  have  good  potential  for  success  in  a  real-time  environment. 

In  the  processing  operation,  a  binary  input  pattern  is  presented  to  a  shift-invariant 
imaging  system  and  a  blurring  operation  performed.  The  blur  function,  which  plays  the 
role  of  the  structuring  element  in  the  transformation,  is  often  a  disk,  though  other 
shapes  are  easily  accommodated.  The  resulting  blurred  image  is  then  hard  limited.  If  the 
threshold  is  set  for  a  high  intensity,  an  erosion  results;  with  a  low  intensity  threshold, 
a  dilation  results.  Setting  the  threshold  at  the  50%  level  results  (with  binary  input  and 
blur  functions)  in  a  median  operation.  Other  threshold  levels  can  be  used  to  obtain  other 
rank-order  processing  operations,  assuming  the  input  and  blur  functions  are  binary.  These 
simple  operations  have  been  successfully  demonstrated,  and  results  have  been  presented  at 
a  conference.  A  journal  publication  is  in  preparation. 

Continuing  work  in  this  area  centers  on  further  development  of  these  opto-electronic 
methods  and  their  extension  to  (a)  skeleton  decomposition  of  binary  images  and  (b) 
nonlinear  filtering  (e.g.,  median  filtering)  of  gray  scale  images.  We  have  worked  out 
conceptually  a  method  for  skeleton  decomposition  that  involves  integrating  the  results  of 
a  number  of  erosions  followed  by  implementation  of  a  Laplacian  or  similar  operation  and  a 
hard  limiting.  All  operations  can  be  performed  opto-electronically  in  parallel.  A  method 
for  nonlinear  processing  of  gray  scale  images  has  also  been  worked  out  and  is  under 
experimental  investigation. 

C.  Other  Image  Processing  Research 

Quasi  Space-Invariant  Imaging.  Contrary  to  widespread  belief,  a  single-lens  imaging 
system  with  limiting  aperture  in  the  lens  plane  is  not  space-invariant  for  coherent 
imaging  and  cannot,  therefore,  be  described  by  a  convolutional  input-output  relationship. 
Recently  we  conducted  an  experimental  verification  of  a  new  principle  of  quasi  space- 
invariance  that  applies  to  the  intensity  transmittance  of  diffuse,  coherently  illuminated 
objects  imaged  with  what  would  normally  be  a  space-variant  single-lens  imaging  system.  A 
brief  manuscript  presenting  the  principle  is  in  preparation.  ITic  quasi  space-invariance 
arises  because  of  the  whitening  of  the  object  spectrum  introduced  by  the  diffuser. 
Because  of  this  whitening,  all  regions  of  the  object  "see"  the  same  effective  transfer 
iunction  rather  than  substantially  different  ones.  The  principle  is  primarily  of  academic 
interest  because  of  the  speckle  that  is  introduced  in  the  image. 


Coherent  Optical  Processing  of  Incoherently  Illuminated  Objects.  Normally  it  is  not 
possible  to  perform  classical  coherent  spatial  filtering  operations  like  phase-contrast 
and  dark-field  imaging  on  objects  that  are  illuminated  incoherently  (e.g.,  by  light  from  a 
spatially  extended  incoherent  source).  We  have  recently  discovered  a  new  method  by  which, 
under  a  limited  set  of  conditions,  it  is  indeed  possible  to  do  coherent  spatial  filtering 
on  incoherently-illuminated  objects.  The  idea  was  First  considered  in  the  context  of 
phase -contrast  imaging  of  thermal  plumes  from  submarines  illuminated  by  noise  from  ocean 
waves.  Our  extension  is  to  the  optical  case.  A  key  restriction  is  that  the  object  be 
only  weakly  diffracting,  such  as  a  weak  phase  object.  We  are  preparing  to  verify  the 
concept  experimentally. 

Falling  Raster! Folded  Spectrum  Relationships.  We,  along  with  a  small  number  of  other 
researchers  (e.g.,  William  Stoner,  Harper  Whitehouse,  Terry  Turpin),  have  known  for  some 
time  that  alternatives  exist  to  the  conventionally- employed  falling  raster  format  for 
performing  2-D  spectrum  analysis  on  1-D  signal  information.  However,  these  alternatives 
have  never  been  modeled  analytically.  During  the  past  quarter  a  new  doctoral  student  has 
begun  the  task  of  detailing  the  basic  relationships  between  the  non-conventional  recording 
formats  and  the  corresponding  2-D  "folding"  of  the  signal  spectral  distributions.  It  is 
not  yet  clear  what  kind  of  payoff  we  can  expect  from  this  study,  though  the  resulting 
increased  flexibility  in  recording  format  may  be  of  use  in  some  high-speed  spectral 
analysis  applications. 

Low-Signal-to-Bias-Ratio  Optical  Signal  Processing.  Work  that  began  under  this 
program  in  connection  with  a  Fourier  transform  scanning  hybrid  image  processing  system  has 
evolved  to  center  on  the  general  problem  area  of  bias  buildup,  signal-to-bias  ratio,  and 
signal-to-noise  ratio  in  time-integration  optical  processing.  This  is  a  generic  problem 
that  affects  several  important  classes  of  optical  signal  processing,  including  time- 
integration  acousto-optic  processing  of  radar  and  communication  signals  and  incoherent 
holography. 

We  have  begun  a  conceptual  consolidation,  with  respect  to  bias  problems,  of  previous 
work  in  the  areas  of  time-integration  optical  signal  processing,  incoherent  holography, 
and  multiple-exposure  holography  and  are  developing  an  analytical  formulation  of  the 
general  problem  applicable  to  these  three  areas.  A  method  has  been  found  and  reported  for 
optimizing  small  signal-to-bias-ratio  exposures  for  time-integration  optical  processing 
where  bleached  silver-halide  recording  materials  are  used  as  the  detector.  Analytical 
details  were  reported  at  a  recent  conference.  We  are  currently  conducting  experimental 
verification.  Assuming  our  analytical  results  are  confirmed,  we  shall  submit  a  manuscript 
for  Journal  publication. 
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Journal  of  the  Optical  Society  of  America  A,  Vol.  3  (April  1986),  pp.  1826-1832. 
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Society  of  America  Topical  Meeting,  April  1986,  Honolulu,  pp.  27-30. 

2.  O’Neill,  Kirt  S.  and  Rhodes,  William  T.,  "Morphological  Transformations  by  Hybrid 
Optical-Electronic  Methods,”  in  Hybrid  Image  Processing,  D.  Casasent  and  A.  Tescher, 
eds.  (Proc.  SPIE,  vol.  638,  1986),  pp.  41-44. 

3.  Rhodes,  William  T.,  Stroud,  Robert,  and  Gaynor,  Edwin  S.,  "Maximizing  Diffraction 
Efficiency  of  Bleached  Time-Integration-Exposure  Silver-Halide  Holograms,"  in 
Holography  Technical  Digest  (Optical  Society  of  America,  1986),  pp.  96-99. 


Papers  Submitted  (for  Journal  Articles  or  Conference  Proceedings ) 

1.  Mait,  Joseph  N.  and  Rhodes,  William  T.,  ”A  pupil  function  design  algorithm  for 
bipolar  incoherent  spatial  filtering,”  submitted  to  Applied  Optics. 


Interactions  with  DoD  Labs 

Met  with  Dr.  A1  Ellinthorpe  of  DARPA/STO  on  30  May  1986. 

Met  with  Dr.  John  Lee  at  the  Naval  Research  Laboratory,  13  August  1986. 

Met  with  P.  Dcnzil  Stilwcll  and  Dr.  John  Lee  at  the  Naval  Research  Laboratory,  23-25 
September  1986. 

Met  with  Dr.  Jacques  Ludman  of  RADC/Hanscom  Field  on  18  December  1986. 
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TITLE: 


Optimal  Multiprocessor  Structures  for  the  Implementation  of  Digital  Signal 
Processing  algorithms  on  High  Density  Integrated  Circuits 

SENIOR  PRINCIPAL  INVESTIGATORS: 

D.  A.  Schwartz,  Assistant  Professor  and  J.  H.  Schlag,  Professor 

SCIENTIFIC  PERSONNEL: 

H.  Forrcn,  (Ph.D.  candidate) 

SCIENTIFIC  OBJECTIVE: 

To  develop  techniques  for  the  automatic  generation  of  optimal  or  highly  efficient 
implementations  of  digital  signal  processing  algorithms  for  synchronous  multiprocessor 
VLSI  architectures. 


RESEARCH  ACCOMPLISHMENTS: 

A.  Generalization! Extension  of  Cyclo-Static  Scheduling 

Work  performed  in  conjunction  with  H.  Forrcn,  a  research  assistant,  has  yielded 
generalizations  and  extensions  to  cyclo-static  scheduling  as  well  a  rudimentary  proposal 
for  a  basic  VLSI  processor  node. 

The  method  of  cyclo-static  scheduling  has  been  generalized  or  extended  in  the 
following  areas:  1)  inhomogeneous  processors;  2)  pipelined  processing  elements;  3) 
hierarchical  nodes;  5)  a  communications  bound;  and  6)  Gantt  chart  permutations. 

Previously  only  homogeneous  processors  were  considered.  For  many  DSP  problems  such 
as  filtering,  only  the  atomic  operations  of  multiplication  and  addition  are  needed.  When 
the  processing  elements  are  restricted  to  homogeneous  processors  even  though  processor 
utilization  may  be  optimal,  the  system  efficiency  is  lower  since  the  processor  must  be 
capable  of  both  multiplying  and  adding.  This  usually  implies  that  the  utilization  within 
the  processor  node  is  sub-optimal.  This  is  particularly  important  when  considering  algo¬ 
rithmically  specialized  (custom)  processors  in  the  VLSI  context.  For  efficient  utiliza¬ 
tion  of  silicon  area  a  design  based  on  simple  multiplier  and  adder  elements  is  more 
effective  than  a  general  purpose  processor  that  must  include  complex  control  and  storage 
and  ALU.  The  previously  reported  lower  bound  on  the  number  of  processors  can  be  directly 
extended  by  partitioning  operations  into  a  set  of  operations  associated  with  each  proces¬ 
sor  type.  This  can  also  be  used  to  model  inhomogeneous  communications  delay  since  the 
model  normally  binds  communication  delay  to  the  producing  operator. 

Pipelined  processing  elements  are  attractive  at  both  the  discrete  and  VLSI  clement 
level.  This  is  because  pipelining  usually  yields  the  lowest  cost  approach  to  concurrency. 
Pipelined  processor  elements  can  also  be  handled  by  a  minor  extension  to  the  previously 
reported  bounds.  The  iteration  period  bound  and  the  delay  bound  arc  now  computed  in 


terms  of  the  pipeline  latency  instead  of  the  operational  latency.  Other  expressions  such 
as  those  for  the  processor  lower  bound  and  for  the  processor  modulo  constraint  must  use 
the  stage  latency.  While  the  extensions  are  very  simple,  they  are  also  very  powerful  and, 
in  addition,  have  been  successfully  applied  as  a  tool  for  writing  optimal  code  for  pipe¬ 
lined  array  processors  and  pipelined  DSP  chips. 

Hierarchical  graplis  can  be  used  to  reduce  the  apparent  complexity  of  the  graph  to  the 
point  where  the  optimization  tools  are  effective.  Further  modifications  of  previous 
expressions  must  be  made  in  order  to  allow  the  use  of  hierarchical  nodes.  Instead  of 
associating  a  single  latency  with  the  time  from  the  arrival  of  the  last  input  to  the 
appearance  of  the  single  output,  an  independent  latency  must  be  associated  with  each  input 
or  output  (where  multiple  outputs  must  be  considered  as  well). 

A  new  bound  on  communications  has  been  determined.  This  communications  bound 
reflects  the  local  characteristics  of  a  homogeneous  communications  network  required  to 
implement  a  signal  flowgraph.  The  number  of  neighbors  with  which  a  processor  can  directly 
communicate  (including  itself)  is  an  important  measure  of  network  complexity.  By  obser¬ 
ving  the  worst  case  fan-in  or  worst  case  fan-out  of  a  signal  flowgraph,  as  well  as  the  pad 
time  allowed  along  the  respective  branches,  the  direct  neighbor  lower  bound  can  be  estab¬ 
lished. 

In  prior  work,  only  adjacent  processor  communications  were  considered  in  scheduling 
and  mapping  a  solution  onto  a  target.  For  an  architecture  that  supports  parallel  communi¬ 
cations  and  operations  it  is  practical  to  pass  data  through  more  than  one  processor  when 
the  schedule  has  sufficient  slack  time  between  the  availability  of  the  data  and  the  need 
to  use  it.  From  a  valid  schedule,  the  exact  communication  requirements  of  that  schedule 
may  be  found  directly.  Simple  formulas  indicate  the  number  of  communication  steps 
available  for  data  to  travel  from  one  processor  to  another.  The  distance  between  these 
processors  in  terms  of  a  network  model  is  similarly  found.  If  all  chords  of  a  network  (if 
any)  are  shorter  than  this  specification,  the  data  can  not  travel  far  enough  in  the  time 
allowed.  Defining  the  local  radius  of  the  network  to  be  the  greatest  distance  that  data 
can  travel  in  one  communications  step  simplifies  the  process  of  determining  a  valid  (if  it 
exists)  processor  mapping  given  a  schedule.  These  preliminary  results  may  form  the  basis 
of  an  effective  scheduler  that  can  determine  the  minimum  possible  communications  support 
for  a  given  FSFG/algorithm. 

The  Gantt  Chart  has  been  presented  as  a  useful  representation  of  a  schedule  or 
program.  Numerous  manipulations  to  the  Gantt  Chart  allow  one  schedule  or  program  to  be 
transformed  into  another  trivially  different  one.  In  the  case  of  the  program,  a  dramatic 
difference  in  the  communications  network  may  result.  A  brief  list  of  manipulations  is 
presented  here. 

Columns  may  be  arbitrarily  rotated,  effecting  only  a  relative  time  shift. 

Rows  may  be  arbitrarily  switched,  effecting  which  processor  does  which  row. 

Row  fragments  may  be  sliced  and  spliced  so  long  as  there  is  no  time  shift. 

Any  two  (or  all)  cycles  may  be  spliced  into  a  single  cycle. 


An  important  result  of  manipulating  Gantt  Charts  is  the  ability  to  transform  any 
schedule  or  program  into  SSIMD  form  with  homogeneous  processors.  The  homogeneous  processor 
requirement  is  obvious  for  SSIMD.  If  inhomogeneous  processors  arc  required,  then  cycles 
may  be  spliced  until  a  single  cycle  exists  for  each  type  of  operation  or  processor.  If 
the  number  of  rows  in  each  cycle  is  the  same,  then  the  result  is  a  PSSIMD  solution.  If 
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the  number  of  rows  in  any  two  cycles  differ,  then  the  result  is  a  multiple  cyclo-static 
instruction  streams,  multiple  data  streams  (MCIMD)  solution.  [MCIMD  is  an  extension  of 
cyclo-static  to  allow  multiple  lattice  vectors,  which  are  often  necessary  for 
inhomogeneous  solutions]. 

The  subject  of  the  following  section  is  the  result  of  work  performed  in  conjunction 
with  P.  W.  Hutto,  a  graduate  student.  Hutto  was  supported  by  with  funds  from  DARPA  and 
the  State  of  Georgia  (E-Funds).  His  advisor,  D.  A.  Schwartz  was  supported  in  this  re¬ 
search  by  JSEP.  Progress  and  preliminary  results  has  been  made  in  the  following  areas. 


B.  Optimal  Data  Routing 

Based  on  the  theoretical  properties  of  cyclo-static  realizations,  DARPA  sponsored  the 

design  of  a  small  scale  model  of  a  super  computer  for  digital  signal  processing  which  is 

referred  to  as  OSCAR  (Optimal  Synchronous  Cyclo-static  ARray).  OSCAR  features  a 
generalized/extended  ILLLAC-IV  type  interconnection  network.  The  ILLLAC-IV 
interconnection  network  was  a  simple  SIMD  machine  that  required  each  node  to  execute  the 
identical  routing  instruction.  OSCAR  allows  independent,  arbitrary,  five  port 

interconnection  at  each  node.  In  addition  the  interconnection  network  operates  in 

parallel  with  processor  node  computations.  This  powerful  interconnection  network  is  the 
backbone  of  OSCAR’s  fine  grain  parallelism.  However,  for  those  algorithms  of  interest 
that  do  not  meet  the  model  (cyclo-static)  of  the  existing  compiler  tools  (e.g.  FFT),  the 
problem  of  optimal  data  routing  for  implementations  where  data  must  be  passed  through  one 
or  more  processor  nodes  to  reach  its  destination  is  largely  unsolved.  Recent  research  has 
only  addressed  hypercube  routing  and  restricted  cases  of  ILLIAC-IV  type  meshes. 

Spurred  by  problems  in  determining  an  optimal  routing  algorithm  for  a  parallel  FFT,  a 
software  testbed  was  developed  to  investigate  optimal  routing  for  generalized  many-to-many 
data  permutations.  An  optimization  technique  based  on  simulated  annealing  was  developed 
to  determine  optimal/ncar  optimal  routes.  The  method  is  based  on  assigning  minimum  length 
paths  to  all  data  paths,  then  annealing  the  paths  by  introducing  delays  and  path  permuta¬ 
tions  so  as  to  eliminate  all  communications  conflicts. 

Preliminary  results  indicate  that  the  method  is  very  efficient  at  producing 

optimal/near  optimal  data  routes.  The  viability  of  the  the  simulated  annealing  technique 

was  suggested  by  the  success  of  the  method  for  optimizing  layout  routes  for  VLSI. 

C.  Differential  Equation  Compiler 

The  numerical  solution  to  differential  equations  is  a  very  common,  CPU  intensive 
application  of  scientific  computing.  In  particular  there  has  been  great  interest  in  the 

real  time  solution  of  the  equations  of  motion  for  robot  arms.  It  was  noted  that  the 

solution  to  initial  value  problems  that  do  not  use  data-adaptive  integration  techniques 
are  precisely  the  class  of  problems  that  can  be  optimally  solved  with  cyclo-static 
techniques.  However  the  resulting  FSFG's  arc  complex  and  irregular.  To  provide  a  more 
effective  environment  a  simple  compiler  was  developed  to  demonstrate  the  feasibility  of 
using  a  high  level,  FORTRAN-like,  language  to  specify  the  systems  of  differential  equa¬ 
tions.  The  output  of  the  compiler  is  a  list  that  specifies  the  resulting  FSFG.  The 
current  implementation  is  rather  frail  and  needs  to  be  strengthened  to  solve  a  larger 
class  of  equations,  and  to  use  more  advanced  solution  techniques.  This  was  only  intended 
as  a  feasibility  study  and  it  is  not  planned  to  pursue  this  area  further  at  this  time. 


D.  Off  Chip  Interface  Issues 


As  a  result  of  developing  the  OSCAR  architecture  several  results  were  discovered 
relating  to  communications  design  and  off-chip  interfaces  in  fine  grain  parallel 
architectures.  It  is  unlikely  that  it  will  be  possible  to  implement  a  high  performance 
multiprocessor  system  on  a  single  chip  in  the  near  future.  Therefore  the  off  chip 
interface  is  crucial  in  the  design  process.  There  is  always  a  large  speed  penalty  for 
going  off  chip.  In  the  cyclo-static  multiprocessor  model  it  had  be  :n  assumed  that  the 
interchip  delay  was  negligible  compared  to  the  intrachip  computation  time  (for  an  atomic 
operation).  During  the  course  of  the  past  three  years  this  has  changed  significantly. 

Two  methods  have  been  developed  to  handle  this  problem.  First,  if  the  interchip  delay 
is  comparable  and  less  than  (or  equal  to)  the  operational  delay,  then  the  interchip 
communications  time  can  be  effectively  pipelined  by  using  an  interleaved  design.  This 
allows  all  communications  to  be  treated  homogeneously  and  maximizes  the  system  throughput. 
The  penalties  are  that  this  requires  partitioning  (or  cooperation)  of  system  resources 
between  the  tasks,  the  existence  of  orthogonal  tasks,  increased  throughput  delay,  and 
potential  problems  with  synchronizing  tasks. 

The  second  method  is  to  treat  the  interchip  vs.  intrachip  communications  as  an 
inhomogeneous  communications  medium.  In  the  cyclo-static  system  model  it  may  be  possible 
to  assign  all  operations  in  a  critical  loop  to  a  single  chip  and  to  split  non-recursive 
sections  and  non-critical  loops  with  sufficient  slack  time  across  chip  boundaries  by 
spending  the  slack  time  on  the  communications  delay.  Simple  examples  have  demonstrated 
the  feasibility  of  this  approach,  however  a  formal  method  to  handle  inhomogeneous  communi¬ 
cations  is  in  the  early  stages  of  development  and  will  require  future  work. 

Both  methods  seem  to  indicate  that  for  technologies  such  as  GaAs  where  the  ratio  of 
interchip  to  intrachip  delay  is  so  large  that  fine  grain  parallelism  across  chip 
boundaries  is  impractical.  This  suggests  a  hierarchical  approach  of  fine  grain 
parallelism  for  intrachip  scheduling  and  medium  to  coarse  grain  parallelism  for  interchip 
scheduling. 

A  second  issue,  closely  related  to  the  previous  issue  is  communication  bandwidth.  To 
achieve  fine  grain  parallelism  a  very  large  communications  bandwidth  is  required.  Consi¬ 
der  the  case  of  all  atomic  operations  being  simple  binary  operators  (two  inputs,  one 
output).  For  each  operation,  potentially,  two  operands  must  be  fetched  from  other  proces¬ 
sor  elements.  If  the  communications  time  is  to  be  negligible  compared  to  the  computation 
time  this  implies  approximately  an  order  of  magnitude  more  communications  bandwidth  then 
used  in  conventional  design  wisdom.  In  order  to  make  it  practical  to  find  optimal/good 
schedules,  at  present  it  is  necessary  to  insure  that  the  communications  is  non-blocking. 
This  can  only  be  achieved  by  supporting  an  even  higher  communications  bandwidth  (which  is 
a  function  of  the  algorithm  and  the  target  communications  network).  This  does  not  imply 
that  fine  grain  parallelism  is  flawed.  What  is  does  imply  is  that  for  a  system  to  be  able 
to  solve  a  class  of  problems  efficiently  with  large  parallelism  that  there  is  a  need  for 
large  communications  bandwidth. 

In  short,  powerful,  high  bandwidth  communications  is  fundamental  to  fine  grain 
parallelism.  Much  more  work  needs  to  done  on  the  fundamental  architectual  implications  of 
communications  requirements  on  the  off  chip  interface.  It  is  also  obvious  that  fine  grain 
parallelism  carries  with  it  architectual  complexities  that  may  be  better  addressed  by 
applying  the  fine  grain  approach  and  theory  on  a  coarser  grain  (herein  lies  current 
applicability). 
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SCIENTIFIC  OBJECTIVE: 

The  broad  objective  of  this  research  is  to  develop  new  methodology  for  making 
electromagnetic  measurements  directly  in  the  time  domain  or  over  a  wide  bandwidth  in  the 
frequency  domain.  This  research  includes  the  development  of  the  theoretical  analyses 
necessary  to  support  the  measurement  techniques.  One  aspect  of  the  research  is  the 
systematic  study  of  radiating  structures  placed  near  or  embedded  in  material  bodies.  In  a 
practical  situation,  the  radiator  might  serve  as  a  diagnostic  tool  for  determining  the 
geometry,  composition  or  electrical  constitutive  parameters  of  the  body. 


RESEARCH  ACCOMPLISHMENTS: 

During  the  last  year,  research  was  initiated  on  the  following  new  topics. 

A.  Pulse  Excited  Antennas  Near  a  Material  Interface 

Ground  penetrating  radar  systems  have  been  proposed  for  many  applications;  these 
include  the  detection  of  mines  and  buried  unexploded  ordinance,  the  location  of  buried 
utilities  and  the  mapping  of  subsurface  geological  structure.  The  systems  generally  make 
use  of  a  temporarily  short,  wide  bandwidth,  pulse.  The  pulse  is  transmitted  and  received 
by  one  or  more  antennas  located  above  the  surface  of  the  earth. 

The  characterization  and  design  of  antennas  for  this  application  are  complicated  by 
two  factors:  the  broad-band  requirements  and  the  effect  of  the  air/earth  interface  on  the 
performance  of  the  antenna.  Two  types  of  antennas  have  generally  been  used  for  systems  of 
this  kind:  dipole  type  antennas  located  very  close  to  the  air/earth  interface  and  TEM 
horn  antennas  located  at  a  larger  distance  from  the  air/earth  interface. 

We  have  initiated  a  research  program  to  study  new  antennas  for  this  application. 
Experimental  facilities  were  constructed  for  measuring  the  performance  of  antennas  over  an 
interface  between  air  and  wet  sand.  Sensors  and  targets  embedded  in  the  sand  are  used  to 
evaluate  the  field  of  the  antenna. 


Scale  model  antennas  are  used  with  this  system;  full  size  antennas  would  generally 
require  larger  test  facilities  than  can  be  accommodated  in  our  laboratory.  Techniques 
have  been  developed  for  correcting  errors  in  the  automated  time-domain  instrumentation 
used  with  this  system. 

B.  Materials  for  Electromagnetic  Scale  Models 

Electromagnetic  scale  models  are  of  value  in  the  design  and  testing  of 
electromagnetic  systems  that  are  on  a  scale  too  large  or  too  small  for  routine  laboratory 
use.  For  example,  the  performance  of  antennas  over  or  buried  in  the  earth  can  be 
evaluated  in  the  laboratory  using  reduced  size  scale  models  operated  at  frequencies  higher 
than  those  used  for  the  actual  antennas.  For  this  model,  a  material  is  needed  that  will 
model  the  electrical  properties  of  the  earth.  Scale  models  with  increased  size  are  also 
used.  A  microscopic  antenna  on  a  substrate  may  be  modeled  by  a  larger  antenna  operated  at 
a  frequency  lower  than  that  for  the  actual  antenna.  A  material  is  then  needed  that  can 
model  the  electrical  properties  of  the  substrate. 

Clearly  it  is  desirable  to  have  a  series  of  materials  with  a  range  of  electrical 
properties  or  mixtures  of  materials  with  adjustable  electrical  properties  for  use  in  scale 
models.  We  are  beginning  a  study  of  the  electrical  properties  of  emulsions  (mixtures  of 
oil  and  water)  as  binary  solutions  with  adjustable  electrical  properties.  The 
constituents  of  the  emulsion  are  relatively  inert  and  satisfy  all  of  the  requirements  as 
to  toxicity,  flammability,  etc.  We  started  a  systematic  investigation  of  the  electrical 
properties  of  several  emulsions,  with  the  objective  of  obtaining  information  on: 

a.  the  range  of  electrical  parameters  that  can  be  realized, 

b.  The  frequency  dependence  of  the  electrical  properties, 

c.  the  accuracy  of  mixing  formulas  for  predicting  the  electrical 
properties, 

d.  the  stability  of  the  emulsions. 
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Papers  in  Conference  Proceedings 
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Interaction  with  DoD  Lubs 

During  the  period  of  the  contract,  a  study  entitled  "Hardened  Antenna  Technology 

Assessment"  was  performed  for  the  Air  Force  (RADC,  Griffis  AFB).  This  study  made  use  of 

information  developed  on  the  Joint  Services  Electronics  Program. 
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SCIENTIFIC  OBJECTIVE: 

The  long  term  objective  of  this  research  is  to  understand  the  near  field  and  far 
field  coupling  between  antennas  in  the  presence  of  scatterers.  Special  emphasis  is  placed 
on  determination  of  limits  of  accuracy  in  the  measurement  of  the  fields  radiated  or 
scattered  by  an  antenna- under-test  by  a  second  antenna  and  to  develop  techniques  and 
computer  algorithms  for  compensation  of  such  measurements  due  to  known  geometrical  or 
electromagnetic  anomalies. 

Three  application  areas  are  pursued:  a)  antenna  measurements,  where  the  effects  of 
scatterers  are  suppressed  or  compensated;  b)  scattering  measurements,  where  the  effects 
of  scatterers  are  enhanced  and  c)  radome  measurement,  where  the  effects  of  the  scattercr 
(the  radome)  are  of  equal  importance  to  the  antenna  measurement. 


RESEARCH  ACCOMPLISHMENTS: 

A.  Spectral  Evaluation  of  Reflector  Surfaces  Used  for  Compact  Antenna  Ranges 

The  test  zone  field  quality  of  a  compact  antenna  range  is  dependent  on  primarily 
three  factors:  the  surface  accuracy  and  smoothness  of  the  range  reflector,  the  scattering 
and  diffraction  from  the  edge  of  the  reflector  and  the  complex  vector  far  field  pattern  of 
the  feed  antenna.  It  was  shown  that  the  effect  of  the  surface  accuracy  and  smoothness  can 
be  quantified  using  a  newly  developed  spectral  analysis  technique.  The  surface  error  is 
expressed  as  a  two-dimensional  summation  of  sinusoids.  The  illuminating  field  is 
expressed  as  a  two-dimensional  spectrum  of  plane  waves  incident  on  the  reflector.  It  is 
shown  that  the  effect  of  the  reflector  surface  error  is  to  phase  modulate  each  plane  wave 
with  a  modulation  index  proportional  to  the  magnitude  of  the  sinusoidal  components  of  the 
surface  error.  The  compact  range  geometry  was  shown  to  be  a  low-pass  filter,  allowing 
only  those  plane  waves  with  spatial  period  of  approximately  three  wavelengths  and  below  to 
pass  to  the  quiet  zone.  Thus  it  was  learned  which  reflector  surface  accuracy  were 


important  for  compact  range  use  and  the  relationship  between  surface  accuracy  and  test 
zone  field  performance.  These  results  were  presented  at  the  1986  Antenna  Measurement 
Techniques  Association  Workshop  on  Compact  Ranges  and  presented  at  the  1986  Antenna 
Measurement  Techniques  Association  Meeting. 

B.  Near-Field  Measurement  of  Radome  Performance 

Initial  work  has  been  carried  out  to  develop  a  technique  of  near-field  measurements 
of  radome  performance.  Spherical  surface  near-field  measurements  have  been  made  of  a 
fused  silica,  tangent  ogive,  missile  radome  at  11.75  GHz.  A  special  antenna  was  designed 
and  constructed  and  an  existing  radome  was  used  for  the  measurement.  A  "backward 
transform"  algorithm  is  concurrently  under  development  to  determine  the  fields  on  the 
outer  surface  of  the  radome  from  the  measured  near-field  data.  Backward  transforming 
using  a  spherical  wave  expansion  is  both  non-unique  and  unstable.  The  higher  order  modes, 
which  must  be  allowed  in  the  solution,  increase  rapidly  as  the  radius  of  the  back 
projection  decreases.  Preliminary  results  show  that  sufficient  information  is  contained  in 
the  radiating  modes  and  little  is  lost  in  setting  the  higher  order  mode  amplitudes  to 
zero.  Papers  describing  the  near-field  measurements  made  with  and  without  a  radome  and 
the  results  of  the  initial  work  on  radome  performance  assessment  using  this  new 
measurement  technique  were  presented  at  the  18th  Symposium  on  Electromagnetic  Windows  and 
the  1986  Antenna  Measurement  Techniques  Association  Meeting. 

This  work,  plus  other  publications  and  technology  transfers  of  the  principal 
investigator  are  summarized  in  the  following: 
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1.  Effenberger,  J.  A.,  Strickland,  R.  R.,  and  Joy,  E.  B.,  "The  Effects  of  Rain  on  a 
Radome’s  Performance,"  Microwave  Journal,  Vol.  29,  No.  5,  May  1986,  pp.  261-274. 
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Ontario,  September  23-25,  1986. 
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Measurement  of  Radome  Performance,"  Proceedings  of  the  Eighteenth  Symposium  on 
Electromagnetic  Windows,  Atlanta,  Georgia,  September  17-19,  1986. 

4.  Joy,  E.  B.,  Wilson,  R.  E.,  Effenberger,  J.  A.,  Punnett,  M.  B.,  and  Strickland,  R., 

"The  Electromagnetic  Effects  of  Water  on  the  Surface  of  a  Radome,"  Proceedings  of  the 
Eighteenth  Symposium  on  Electromagnetic  Windows,  Atlanta,  Georgia,  September  17-19, 
1986. 
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Technology  Transfer 

1.  U.  S.  Navy:  M.  I.  T.  Lincoln  Laboratory  is  developing  a  high  performance 
surveillance  radar  system  which  incorporates  a  shipboard  mounted  ultra-low  sidclobc 
planar  phased  array  antenna.  The  array  has  a  5-meter  by  10-meter  aperture  and  a  -55 
dB  rms  sidclobc  levels.  M.I.T.  Lincoln  Laboratory  will  install  a  multi-million 
dollar  plane-polar  near-field  measurement  system  to  test  and  align  the  phased  array. 
To  achieve  the  desired  measurement  and  alignment  accuracies,  the  K-corrcction  probe 


position  error  compensation  technique  and  algorithm  developed  under  JSEP  sponsorship 
will  be  transferred  to  M.I.T.  Lincoln  Laboratory.  (The  request  for  proposal  on  the 
facility  specifies  use  of  this  technique.) 

2.  U.  S.  Army  Electronic  Proving  Ground,  Fort  Huachuca,  Arizona:  The  spectral  analysis 
technique,  for  the  evaluation  of  reflector  surfaces  developed  under  JSEP  sponsorship 
is  currently  being  used  in  the  design  of  a  large  outdoor  compact  antenna  measurement 
range  at  Fort  Huachuca. 


